Topic Sensitive SourceRank: Extending SourceRank for Performing Context-Sensitive Search over Deep Web

Authors

  • Manishkumar Jha
  • Hasan Davulcu
  • Huan Liu
  • Subbarao Kambhampati
Abstract

Source selection is one of the foremost challenges in searching the deep web. For a user query, source selection means choosing a subset of deep-web sources expected to provide relevant answers. Existing source selection models assess source quality with local measures based on query similarity. These local measures are necessary but not sufficient: they are agnostic to source trustworthiness and result importance, which, given the autonomous and uncurated nature of the deep web, have become indispensable for searching it. SourceRank provides a global measure of source quality based on source trustworthiness and result importance. SourceRank's effectiveness has been evaluated in single-topic deep-web environments. The goal of this thesis is to extend SourceRank to a multi-topic deep-web environment. Topic-sensitive SourceRank is introduced as an effective way of extending SourceRank to a deep-web environment containing a set of representative topics. In topic-sensitive SourceRank, multiple SourceRank vectors are created, each biased towards a representative topic. At query time, the topic of the query keywords is used to compute a query-topic-sensitive composite SourceRank vector as a linear combination of these pre-computed biased SourceRank vectors. Extensive experiments on more than a thousand sources in multiple domains show 18-85% improvements in result quality over Google Product Search and other existing methods.
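The query-time step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the source identifiers, the topic labels, and all numeric scores are hypothetical, and the abstract does not say how the query's topic weights are estimated, so they are simply supplied as input here.

```python
from typing import Dict, List


def composite_sourcerank(
    topic_vectors: Dict[str, Dict[str, float]],
    query_topic_weights: Dict[str, float],
) -> Dict[str, float]:
    """Linearly combine pre-computed topic-biased score vectors.

    topic_vectors maps topic -> {source -> biased score}.
    query_topic_weights maps topic -> weight of that topic for the query
    (how these weights are obtained is an assumption left to the caller).
    """
    total = sum(query_topic_weights.values())
    if total <= 0:
        raise ValueError("query topic weights must sum to a positive value")

    composite: Dict[str, float] = {}
    for topic, weight in query_topic_weights.items():
        normalized = weight / total  # make the weights sum to 1
        for source, score in topic_vectors.get(topic, {}).items():
            composite[source] = composite.get(source, 0.0) + normalized * score
    return composite


def top_k_sources(composite: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring sources; ties are broken by name."""
    return sorted(composite, key=lambda s: (-composite[s], s))[:k]


# Hypothetical pre-computed vectors for two representative topics.
topic_vectors = {
    "camera": {"src_a": 0.6, "src_b": 0.3, "src_c": 0.1},
    "book": {"src_a": 0.1, "src_b": 0.2, "src_c": 0.7},
}
# Hypothetical topic weights for a query judged to be mostly about cameras.
query_topic_weights = {"camera": 0.75, "book": 0.25}

scores = composite_sourcerank(topic_vectors, query_topic_weights)
selected = top_k_sources(scores, 2)
```

Because the biased vectors are computed offline, the only per-query work in this sketch is the weighted sum and the sort, which is the property that makes the linear-combination design attractive at query time.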


Similar resources

Agreement Based Source Selection for the Multi-Domain Deep Web Integration

One immediate challenge in searching deep web databases is source selection, i.e., selecting the most relevant web databases for answering a given query. For open collections like the deep web, source selection must be sensitive to the trustworthiness and importance of sources. Recent advances solve these problems for single-topic deep web search by adapting an agreement-based approach (c.f. S...


Assessing Relevance and Trust of the Deep Web Sources and Results Based on Inter-Source Agreement

Deep web search engines face the formidable challenge of retrieving high quality results from the vast collection of searchable databases. Deep web search is a two step process of selecting the high quality sources and ranking the results from the selected sources. Though there are existing methods for both the steps, they assess the relevance of the sources and the results using the query-resu...


A Distributed P2P Link Analysis Based Ranking System

Link Based approaches are among the most popular ranking approaches employed by search engines. They make use of the inherent linkage based structure of World Wide Web documents assigning each document an importance score. This importance score is based on the incoming links for a document; a document which is pointed to by many high quality documents should have a higher importance score. Goog...


Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search

The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative “importance” of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accuratel...


Criteria for Cluster-Based Personalized Search

We study personalized web ranking algorithms based on the existence of document clusterings. Motivated by the topic sensitive page ranking of Haveliwala [20], we develop and implement an efficient “local-cluster” algorithm by extending the web search algorithm of Achlioptas, Fiat, Karlin and McSherry [10]. We propose some formal criteria for evaluating such personalized ranking algorithms and p...



Publication date: 2011